Abstract

The Markov chain approximation method is a widely used, relatively
easy to apply, and efficient family of methods for the bulk of stochastic
control problems in continuous time with reflected-jump-diffusion models.
It has been shown to converge under broad conditions, and there are good
algorithms for solving the numerical problems if the dimension is not too
high. We consider a class of stochastic differential games with a reflected
diffusion system model and an ergodic cost criterion, where the controls
for the two players are separated in the dynamics and cost function. It is
shown that the value of the game exists and that the numerical method
converges to this value as the discretization parameter goes to zero. The
actual numerical method solves a stochastic game for a finite state Markov
chain and ergodic cost criterion. The essential conditions are nondegeneracy
and that a weak local consistency condition hold “almost everywhere”
for the numerical approximations, just as for the control problem.


1  Introduction 

The Markov chain approximation method of [19, 20, 22] is a widely used method
for the numerical solution of virtually all of the standard forms of stochastic
control problems with reflected-jump-diffusion models. It is robust and can
be shown to converge under very broad conditions. Extensions of the
approximations to two-person differential games of the discounted, finite time,
stopping time, and pursuit-evasion types were given in [18] for reflected
diffusion models where the controls for the two players are separated in the
dynamics and cost rate functions. In this paper, the basic ideas will be
extended to two-player stochastic dynamic games with the same systems model,
but where the cost function is ergodic. Such ergodic and “separated” models
occur, for example, in risk-sensitive and robust control [2, 3, 7, 15]. In fact,
the game formulation of risk-sensitive control problems for queues in heavy
traffic was our original motivation.

When  the  robust  control  is  for  controlled  queues  in  heavy  traffic,  then  the 
state  is  confined  to  some  convex  polyhedron  by  boundary  reflection  [21].  In 
many  other  applications,  the  state  of  the  physical  problem  is  confined  to  a 
bounded set. One example is the heavy traffic limit of controlled queueing
networks with finite buffers [1, 21] or robust control of such systems as in [2, 3],
where the set is a hyperrectangle. Then robust control would lead to a game
problem with a hyperrectangular state space. If the system state is not a priori
confined  to  a  bounded  set,  then  for  numerical  purposes  it  is  commonly  necessary 
to  bound  the  state  space  artificially  by  adding  a  reflecting  boundary  and  then 
experimenting  with  the  bounds.  Our  systems  model  is  confined  to  a  state  space 
G  that  is  a  convex  polyhedron,  and  it  is  confined  by  a  “reflection”  on  the 
boundary.  More  generally,  the  boundaries  could  be  determined  by  a  set  of 
smooth  curved  surfaces  as  in  [22],  but  we  restrict  attention  to  the  polyhedral 
case,  since  that  is  the  most  common  and  it  avoids  minor  details  which  can  be 
distracting. 

There are many results for various forms of the game problem; e.g., [4, 5,
6, 24, 28, 29]. But there seems to be nothing available concerned with the
ergodic problem for the reflected diffusion model. We will use purely probabilistic
methods of proof. Such methods have the advantage of providing intuition
concerning numerical approximations, they cover many of the problem formulations
to date, and they converge under quite general conditions. The essential
conditions are weak-sense existence and uniqueness of the solution to the controlled
equations, “almost everywhere” continuity of the dynamical and cost rate terms,
and a natural “local consistency” condition. The local consistency and
continuity need hold only almost everywhere with respect to the measure of the basic
model, hence discontinuities in the dynamics and cost function can be treated
under appropriate conditions (see, in particular, the treatment of discontinuities
and complex variational problems with singularities and Theorems 4.6 and 7.1 in
[22]). Furthermore, the numerical approximations are represented as processes
which are close to the original, which gives additional intuitive and practical
meaning to the method.

The methods to be used for the ergodic cost function are quite different from
those used in [18]. They share the foundation in the theory of weak convergence
[9, 13]. But they depend heavily on the approximations to the ergodic cost
control problem as developed in [21, Chapter 4]. The development of the paper has
been structured to take advantage of the results in [21, 22], wherever possible.
To facilitate the development, Subsection 2.2 summarizes the results from [21]
which will be needed here, with an occasional change of notation to suit that
used here.

Subsection 2.1 defines the basic systems model, where the control is
introduced via the Girsanov transformation [17]. The dynamical model is the
reflected stochastic differential equation (2.4), also called the Skorohod problem
[12, 21, 22]. The conditions on the boundary of the state space are (A2.1)-(A2.2).
Condition (A2.1) covers the great majority of cases of current interest,
including those that arise from queueing and communications networks. The
condition is obvious when the state space is a hyperrectangle with reflection
directions being the interior normals. The strategies of the players are as follows.
Player  1  wishes  to  minimize  and  player  2  to  maximize.  For  the  infsup  problem 
(the  upper  value),  at  the  start  of  the  game  (i.e.,  at  t  =  0)  player  1  selects  a 
control.  This  can  be  either  a  pure  (and  time  independent)  feedback  control  or 
a  relaxed  feedback  control  (see  Subsection  2.1  for  the  definition).  The  selected 
control  will  be  used  at  all  t  >  0.  Then  player  2  selects  its  strategy.  This  can 
be  either  a  relaxed  feedback  or  a  classical  relaxed  control.  Whatever  it  is,  once 
selected,  it  cannot  be  changed. 

The  situation  is  analogous  if  player  2  selects  first.  Since  the  controls  for  the 
player  who  chooses  first  are  time  independent  feedback  and  these  are  selected 
and  fixed  at  the  start  of  the  game,  and  only  the  player  choosing  last  can  use 
time  dependent  controls,  complications  due  to  the  notions  of  strategy  in  the 
time  dependent  case  (e.g.,  concerning  the  definition  of  the  value  either  via  a 
limit  of  a  discrete  time  game,  or  via  the  Elliott-Kalton  definition)  do  not  arise. 
In  this  sense  the  paper  is  simpler  than  [18].  On  the  other  hand,  the  treatment 
of  the  ergodic  cost  criterion  adds  substantial  new  complications.  Subsection  2.3 
establishes  the  existence  of  the  controls  yielding  the  upper  and  lower  values, 
using  approximation  methods  from  [21]. 

The Markov chain approximation numerical method is discussed in Subsection
3.1. The methods for getting the approximating chain and cost function
are the same as in [22] for the pure control problem, since it is the process for
arbitrary controls that is approximated. The natural local consistency condition
is stated. The proof of convergence of the numerical method is in Subsection
3.2 and depends on the fact that the original game has a value. The numerical
approximations are games for Markov chains. They might or might not have
a value, depending on the form of the approximation. But it is seen that the
upper and lower values converge to the value of the original game as the
approximation parameter goes to its limit. Finally, the proof that the original game
has a value is given in Section 4.

2 The Dynamical Model and Background Results

2.1 Assumptions and the Dynamical Model


Assumptions.  The  first  assumptions  define  the  state  space  G. 

A2.1. The state space $G$ is the intersection of a finite number of closed half
spaces in Euclidean $r$-space $\mathbb{R}^r$, and is the closure of its interior (i.e., it is a
closed convex polyhedron with an interior and planar sides). Let $\partial G_i$, $i = 1, \ldots,$
denote the faces of $G$, and $n_i$ the interior normal to $\partial G_i$. Interior to $\partial G_i$, the
reflection direction is denoted by the unit vector $d_i$, and $\langle d_i, n_i \rangle > 0$ for each $i$.
The possible reflection directions at points on the intersections of the $\partial G_i$ are
in the convex hull of the directions on the adjoining faces. Let $d(x)$ denote the
set of reflection directions at the point $x \in \partial G$, whether it is a singleton or not.
No more than $r$ constraints are active at any boundary point.

A2.2. For each $x \in \partial G$, define the index set $I(x) = \{i : x \in \partial G_i\}$. Suppose
that $x \in \partial G$ lies in the intersection of more than one boundary; that is, $I(x)$ has
the form $I(x) = \{i_1, \ldots, i_k\}$ for some $k > 1$. Let $N(x)$ denote the convex hull
of the interior normals $n_{i_1}, \ldots, n_{i_k}$ to $\partial G_{i_1}, \ldots, \partial G_{i_k}$, respectively, at $x$. Then
there is some vector $v \in N(x)$ such that $\gamma' v > 0$ for all $\gamma \in d(x)$.

There is a neighborhood $N(\partial G)$ and an extension of $d(\cdot)$ to $N(\partial G)$ that is
upper semicontinuous in the following sense: For each $\epsilon > 0$, there is $\rho > 0$ that
goes to zero as $\epsilon \to 0$ and such that if $x \in N(\partial G) - \partial G$ and $\text{distance}(x, \partial G) < \rho$,
then $d(x)$ is in the convex hull of the directions $\{d(v); v \in \partial G, \text{distance}(x, v) < \epsilon\}$.

Let $\alpha = (\alpha_1, \alpha_2)$, $\alpha_1 \in U_1$, $\alpha_2 \in U_2$, denote the canonical control value, with
$\alpha_i$ the canonical value for player $i$.

A2.3. The $U_i$, $i = 1, 2$, are compact sets in some Euclidean space. The $(r \times r)$
matrix-valued function $\sigma(\cdot)$ on $G$ is Hölder continuous, with $\sigma^{-1}(x)$ bounded,
and the $\mathbb{R}^r$-valued functions $b_i(\cdot)$ on $G \times U_i$ are continuous.

The  uncontrolled  model  is  the  solution  to  the  Skorohod  problem 

$$dx(t) = \sigma(x(t))\, dw(t) + dz(t), \quad x(t) \in G. \tag{2.1}$$

By a solution to (2.1) we mean the following. Let $\Omega$ denote the path space of
$(x(\cdot), z(\cdot), w(\cdot))$, and let $\{\mathcal{F}_t, t < \infty\}$ denote the filtration on the space. The
$x(\cdot)$ and $z(\cdot)$ are $\mathbb{R}^r$-valued, continuous and $\mathcal{F}_t$-adapted, and $w(\cdot)$ is an
$\mathcal{F}_t$-standard $\mathbb{R}^r$-valued Wiener process. The $z(\cdot)$ is the reflection process. Let $\Omega_T$
denote the restriction of $\Omega$ to functions defined on $[0, T]$. Define $\mathcal{F} = \lim_{t \to \infty} \mathcal{F}_t$,
and let $P_x$ denote the measure when the initial condition is $x(0) = x$, with $E_x$
the associated expectation. Let $P_{x,T}(\cdot)$ denote the probability measure when
we confine our interest to paths on the finite interval $[0, T]$.
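As a rough illustration of what a solution to (2.1) looks like, the following sketch simulates a discretized path in the special case where $G$ is the unit hypercube with normal reflection and $\sigma(x) = I$. The function name `simulate_reflected_path`, the step size, and the seed are hypothetical choices made for the illustration. The projected Euler step is a common stand-in for the Skorohod map in this simple geometry and is not the Markov chain approximation method of the paper.

```python
import math
import random

def simulate_reflected_path(x0, T=1.0, n_steps=1000, seed=0):
    """Projected Euler scheme for dx = dw + dz on G = [0, 1]^r.

    Assumptions of this sketch (not from the paper): sigma(x) = I,
    normal reflection, and projection onto G as the reflection step.
    Returns the path x and the accumulated reflection term z.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = [list(x0)]
    z = [[0.0] * len(x0)]
    for _ in range(n_steps):
        new_x, new_z = [], []
        for xi, zi in zip(x[-1], z[-1]):
            trial = xi + rng.gauss(0.0, math.sqrt(dt))  # unconstrained step
            proj = min(1.0, max(0.0, trial))            # project back onto [0, 1]
            new_x.append(proj)
            new_z.append(zi + (proj - trial))           # reflection increment
        x.append(new_x)
        z.append(new_z)
    return x, z

path, refl = simulate_reflected_path([0.5, 0.5])
```

In this sketch the reflection term changes only on steps where the projected state lands on a face of the cube, which mirrors the requirement that $z(\cdot)$ acts only on the boundary.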

The controlled system will be defined via the Girsanov transformation,
starting with (2.1). For a detailed discussion of the Skorohod problem and the
assumptions (A2.1) and (A2.2), see [21, Chapter 3]. See also the brief comment
below (A2.4). We will also need the following condition.

A2.4. There is a unique weak sense solution to (2.1) for each initial condition.

Comments on (A2.1) and (A2.2). One can always construct the extension
in (A2.2). To see that (A2.1) is natural in applications, note the following. If the
state space is being bounded for purely numerical reasons, then the reflections
are introduced only to give a compact set $G$, which should be large enough
so that the effects on the solution in the region of main interest are small. A
common choice is a hyperrectangle with normal reflection directions, in which
case the right side of (2.1) is zero. Next, consider a queueing network model
in the heavy traffic limit [16, 21, 27] where the state space is the nonnegative
orthant, and the probability that an output of the $i$th processor goes to the $j$th
processor is $q_{ij}$. If the spectral radius of the routing matrix $Q = \{q_{ij}; i, j\}$ is
less than unity, then all customers will eventually leave the system. The model
is a special case of (2.4) with $z(t) = [I - Q']y(t)$, where $y_i(\cdot)$ is nondecreasing,
continuous, and can increase only at $t$ where $x_i(t) = 0$. The condition (A2.1)
implies (see [12, 21]) the so-called “completely-S” condition [16, 21, 26], which
is used to ensure that $z(\cdot)$ has bounded variation w.p.1.


Classes of controls. A: Relaxed controls $r_i(\cdot)$. Suppose that for some
filtration $\{\mathcal{F}_t, t < \infty\}$ and standard vector-valued $\mathcal{F}_t$-Wiener process $w(\cdot)$, each
$r_i(\cdot)$, $i = 1, 2$, is a measure on the Borel sets of $U_i \times [0, \infty)$ such that
$r_i(U_i \times [0, t]) = t$ and $r_i(A \times [0, t])$ is $\mathcal{F}_t$-measurable for each Borel set $A \subset U_i$. Then
$r_i(\cdot)$ is said to be an admissible relaxed control for player $i$, with respect to
$w(\cdot)$. If the Wiener process and filtration have been given or are obvious or
unimportant, then we simply say that $r_i(\cdot)$ is an admissible relaxed control for
player $i$ [14, 21, 22]. For Borel sets $A \subset U_i$, we will write $r_i(A \times [0, t]) = r_i(A, t)$.

For almost all $(\omega, t)$ and each Borel $A \subset U_i$, one can define the derivative

$$r_{i,t}(A) = \lim_{\delta \to 0} \frac{r_i(A, t) - r_i(A, t - \delta)}{\delta}.$$

Without loss of generality, we can suppose that the limit exists for each $(\omega, t)$.
Then for all $(\omega, t)$, $r_{i,t}(\cdot)$ is a probability measure on the Borel sets of $U_i$ and
for any bounded Borel set $B$ in $U_i \times [0, \infty)$,

$$r_i(B) = \int_0^\infty \int_{U_i} I_{\{(\alpha_i, t) \in B\}}\, r_{i,t}(d\alpha_i)\, dt.$$

An ordinary control $u_i(\cdot)$ can be represented in terms of the relaxed control $r_i(\cdot)$,
defined by its derivative $r_{i,t}(A) = I_A(u_i(t))$, where $I_A(u_i)$ is unity if $u_i \in A$
and is zero otherwise. The weak topology [22] will be used on the space of
admissible relaxed controls. Relaxed controls are commonly used in control
theory to prove existence theorems, since any sequence of relaxed controls has
a weakly convergent subsequence.
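To make the definition concrete, the sketch below represents a relaxed control on a finite control set by its derivative $r_{i,t}(\cdot)$ and recovers $r_i(A, t)$ by numerical integration. The two-point control set, the helper names, and the midpoint rule are assumptions of the illustration only.

```python
def relaxed_measure(deriv, A, t, n=1000):
    """Approximate r(A, t) = int_0^t r_s(A) ds by a midpoint Riemann sum.

    deriv(s) returns a dict {control value: probability}, i.e. the
    derivative r_s(.) of the relaxed control at time s.
    """
    h = t / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += sum(p for a, p in deriv(s).items() if a in A) * h
    return total

def ordinary_as_relaxed(u):
    """Relaxed representation of an ordinary control: r_t(A) = I_A(u(t))."""
    return lambda s: {u(s): 1.0}

# Hypothetical two-point control set U = {-1, +1}.
U = {-1, +1}
bang_bang = ordinary_as_relaxed(lambda s: +1 if s < 0.5 else -1)
mixed = lambda s: {-1: 0.5, +1: 0.5}   # equal weight on both values at all times

r_plus = relaxed_measure(bang_bang, {+1}, 1.0)   # time spent at +1 on [0, 1]
r_all = relaxed_measure(mixed, U, 2.0)           # r(U, t) should equal t
```

The second computation checks the normalization $r_i(U_i \times [0, t]) = t$ for a control that is not representable as an ordinary control.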


B: Relaxed feedback controls $m_i(\cdot)$ [10, 21]. Suppose that $m_i(x, \cdot)$, $i = 1, 2$,
is a probability measure on the Borel sets of $U_i$ for each $x \in G$ and that $m_i(\cdot, A)$
is Borel measurable for each Borel set $A \subset U_i$. Then we say that $m_i(\cdot)$ is a
relaxed feedback control. Define $U = U_1 \times U_2$. For relaxed feedback controls $m_i(\cdot)$,
define $m(\cdot)$ by $m(x, d\alpha) = m_1(x, d\alpha_1) m_2(x, d\alpha_2)$. Then $m(\cdot)$ is also a relaxed
feedback control, but with control value space $U$. All $m(\cdot)$ will be of this
product form for some relaxed feedback controls $m_i(\cdot)$, $i = 1, 2$. If $x(\cdot)$ is a solution
to (2.4), and $m(\cdot)$ a relaxed feedback control, then $m(\cdot)$ can be represented by a
relaxed control $r(\cdot)$ with derivative $r_t(d\alpha) = r_{1,t}(d\alpha_1) r_{2,t}(d\alpha_2) = m(x(t), d\alpha)$.

The  control  for  the  player  that  chooses  its  control  first  will  always  be  a 
relaxed  feedback  control,  but  that  for  the  player  who  chooses  its  control  last 
might  be  either  a  relaxed  feedback  control  or  a  relaxed  control  which  is  not 
representable  in  relaxed  feedback  form. 

Defining the controlled dynamical system via the Girsanov transformation:
Relaxed feedback controls. The controlled model will be defined
via the Girsanov transformation [17]. Some of the well known details will be
described, since the equations will be needed for the approximations. This will
be done first for the relaxed feedback controls. Let $m_i(\cdot)$, $i = 1, 2$, be relaxed
feedback controls and define $m(x, d\alpha) = m_1(x, d\alpha_1) m_2(x, d\alpha_2)$. Define

$$b_{i,m_i}(x) = \int_{U_i} b_i(x, \alpha_i)\, m_i(x, d\alpha_i), \qquad b(x, \alpha) = b_1(x, \alpha_1) + b_2(x, \alpha_2),$$

and set $b_m(x) = \int_U b(x, \alpha)\, m(x, d\alpha) = b_{1,m_1}(x) + b_{2,m_2}(x)$. For $T > 0$ and relaxed
feedback control $m(\cdot)$, define

$$\zeta(T, m) = \int_0^T [\sigma^{-1}(x(s)) b_m(x(s))]'\, dw(s) - \frac{1}{2} \int_0^T |\sigma^{-1}(x(s)) b_m(x(s))|^2\, ds,$$

and set

$$R(T, m) = e^{\zeta(T, m)}.$$

For each $(x, T, m(\cdot))$, define the measure $P^m_{x,T}$ on $(\Omega_T, \mathcal{F}_T)$ via the
Radon-Nikodym derivative $R(T, m)$:

$$dP^m_{x,T} = R(T, m)\, dP_{x,T}. \tag{2.2}$$

For each $(x, m(\cdot))$, the family of measures $P^m_{x,T}$, indexed by $T$, is consistent and
can be extended uniquely to a measure $P^m_x$ on $(\Omega, \mathcal{F})$ that is consistent with
the $P^m_{x,T}$. When there is no control (i.e., where the system is (2.1)), we omit the
superscript $m$. The process $w_m(\cdot)$ defined by

$$dw_m(t) = dw(t) - [\sigma^{-1}(x(t)) b_m(x(t))]\, dt \tag{2.3}$$

is an $\mathcal{F}_t$-standard Wiener process on $(\Omega, P^m_x, \mathcal{F})$ [17]. Now, rewrite the
uncontrolled model (2.1) as

$$dx(t) = b_m(x(t))\, dt + \sigma(x(t))\, dw_m(t) + dz(t). \tag{2.4}$$

Under the measures $\{P^m_x, x \in G\}$, (2.4) is a Markov process and we use $P^m(x, t, \cdot)$
for its transition function. Use $P(x, t, \cdot)$ for the transition function of the
uncontrolled process (2.1). Strictly speaking, the process $w_m(\cdot)$ should be indexed
also by the initial condition $x = x(0)$, but we omit it for notational simplicity.

The controlled dynamical system with relaxed controls. Let $r_i(\cdot)$ be
a relaxed control for player $i$, with derivative $r_{i,t}(\cdot)$, and define
$b_{i,r_i}(x, t) = \int_{U_i} b_i(x, \alpha_i)\, r_{i,t}(d\alpha_i)$. We will also have occasion to use relaxed (and not
necessarily relaxed feedback) controls for one of the players. For specificity at this
point, suppose that a relaxed control is used for player 1 and a relaxed feedback
control is used for player 2. Write $b_{r_1,m_2}(x, t) = b_{1,r_1}(x, t) + b_{2,m_2}(x)$, define
$\zeta(T, r_1, m_2)$, $R(T, r_1, m_2)$, $P^{r_1,m_2}_{x,T}$, $P^{r_1,m_2}_x$, and $w_{r_1,m_2}(\cdot)$ analogously to what was done for
the pure relaxed feedback control case, and rewrite the controlled equation as

$$dx(t) = b_{1,r_1}(x(t), t)\, dt + b_{2,m_2}(x(t))\, dt + \sigma(x(t))\, dw_{r_1,m_2}(t) + dz(t). \tag{2.5}$$

The measures $P^{r_1,m_2}_x$ are used with (2.5). The development is analogous if
player 1 uses the relaxed feedback control and player 2 the relaxed control.
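The Girsanov exponent and the Radon-Nikodym derivative in (2.2) can be evaluated along a discretized uncontrolled path. The following is a minimal one-dimensional sketch: the function name `girsanov_weight`, the scalar setting, and the left-endpoint (Itô) sums are assumptions of the illustration, not a construction taken from [17] or [21].

```python
import math

def girsanov_weight(xs, dws, b_m, sigma_inv, dt):
    """Discretized Girsanov exponent and its exponential.

    xs  : states x(t_k) at the left endpoints of the time steps
    dws : Wiener increments w(t_{k+1}) - w(t_k) under the uncontrolled measure
    b_m, sigma_inv : scalar functions of the state (1-D sketch)
    Returns (zeta, R) with R = exp(zeta).
    """
    zeta = 0.0
    for x, dw in zip(xs, dws):
        theta = sigma_inv(x) * b_m(x)            # sigma^{-1}(x) b_m(x)
        zeta += theta * dw - 0.5 * theta ** 2 * dt
    return zeta, math.exp(zeta)

# Constant drift b_m = 2 with sigma = 1 on three steps of length 0.01.
zeta, R = girsanov_weight(xs=[0.0, 0.1, 0.05], dws=[0.1, -0.05, 0.2],
                          b_m=lambda x: 2.0, sigma_inv=lambda x: 1.0, dt=0.01)
```

With a constant drift $b$ and $\sigma = 1$ the sum reduces to $b\, w(T) - b^2 T / 2$, which gives a quick sanity check on the discretization.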

Representation of the reflection process $z(\cdot)$. For either the model (2.4)
or (2.5), the process $z(\cdot)$ can be represented as

$$z(t) = \sum_i y_i(t)\, d_i, \tag{2.6}$$

where $y_i(\cdot)$ is nondecreasing, right continuous, increases only at $t$ where $x(t)$
is on the $i$th face of $G$, and satisfies $y_i(0) = 0$. Under (A2.1), (A2.2), and
(A2.4), the representation (2.6) is unique with probability one [21, Theorem
3.6,  Chapter  4].  Let  denote  an  e-neighborhood  of  the  boundary  set  where 
more  than  one  constraint  is  active.  Then,  the  same  theorem  implies  that,  for 
t  >  0,  sup,^^„P™  [?/(<) |/{2:P)6m=}  ^  0  as  e^  0. 

2.2  Background  Results  and  the  Cost  Function 

The development depends heavily on approximation, continuity, and limit
results from [21, Chapter 4] for the control problem. The results carry over to the
game problem, since they are concerned with arbitrary relaxed feedback and
relaxed controls. To facilitate our development, several key results from [21]
will be stated, in the notation of this paper.

Illustration of the use of the Girsanov transformation: Mutual absolute
continuity of the transition functions. The following theorem is
[21, Theorem 3.1, Chapter 4]. We will outline the proof by copying some of
the details from the reference, since similar “Girsanov transformation” methods
underlie many of the results, there are some slight differences worth noting, and
it gives a feeling for the approach. Unless otherwise noted, “almost all” refers
to Lebesgue measure. The symbol $\Rightarrow$ denotes weak convergence.

Theorem 2.1. Assume (A2.1)-(A2.4). Let $m^n(y, \cdot) \Rightarrow m(y, \cdot)$ for almost
all $y \in G$, where $m(\cdot)$ and $m^n(\cdot)$ are relaxed feedback controls. Then for any
$0 < t_0 < t_1 < \infty$ and bounded and measurable real-valued function $f(\cdot)$,

$$\int f(y)\, P^{m^n}(x, t, dy) \to \int f(y)\, P^m(x, t, dy) \tag{2.7}$$

uniformly for $(x, t) \in G \times [t_0, t_1]$. For any $t > 0$, $P^m(x, t, \cdot)$ is absolutely
continuous with respect to Lebesgue measure, uniformly in $m(\cdot)$ and in $(x, t) \in
G \times [t_0, t_1]$. For each relaxed feedback control $m(\cdot)$, the process defined by (2.4)
is a strong Feller process and it has a unique weak-sense solution for each initial
condition $x$.


Proof. We concentrate on the uniformity in $x$ of the convergence (2.7). First
note that, by the weak convergence and the product form of $m^n(\cdot)$, the limit
$m(\cdot)$ can always be represented as $m(x, d\alpha) = m_1(x, d\alpha_1) m_2(x, d\alpha_2)$ for some
relaxed feedback controls $m_i(\cdot)$, $i = 1, 2$, for almost all $x$. The expression (2.7)
can be written equivalently as

$$E_x f(x(t)) R(t, m^n) - E_x f(x(t)) R(t, m) \to 0. \tag{2.8}$$

For notational simplicity, let $\sigma(x) = I$, the identity. We will use the inequalities:

$$|e^a - e^b| \le |a - b|\, |e^a + e^b|, \tag{2.9a}$$

$$E_x \left| \int_0^t b_m'(x(s))\, dw(s) - \int_0^t b_{m^n}'(x(s))\, dw(s) \right|^2 \le E_x \int_0^t |b_m(x(s)) - b_{m^n}(x(s))|^2\, ds. \tag{2.9b}$$

By the continuity and boundedness of $b(\cdot)$ and the weak convergence of the
$m^n(y, \cdot)$ for almost all $y \in G$, we have

$$b_{m^n}(y) = \int_U b(y, \alpha)\, m^n(y, d\alpha) \to b_m(y) = \int_U b(y, \alpha)\, m(y, d\alpha)$$

for almost all $y$. Define

$$\tilde{b}_n(y) = |b_m(y) - b_{m^n}(y)|^2.$$

Let $t \in [t_0, t_1]$, where $0 < t_0 < t_1 < \infty$. By Egoroff’s theorem [11, Theorem 12,
page 149], for each $\epsilon > 0$, there is a measurable set $A_\epsilon$ with $l(A_\epsilon) < \epsilon$, where $l(\cdot)$
denotes Lebesgue measure, such that $\tilde{b}_n(y) \to 0$ uniformly in $y \notin A_\epsilon$. Furthermore,
$P(x, t, \cdot)$ is absolutely continuous with respect to Lebesgue measure for each $x$ and
$t > 0$ (and uniformly in $(x, t) \in G \times [t_0, t_1]$ for any $0 < t_0 < t_1 < \infty$). These facts
imply that

$$E_x \int_0^t \tilde{b}_n(x(s))\, ds \to 0,$$

uniformly in $x \in G$. The last expression, together with the inequalities (2.9),
implies (2.8) uniformly in $x \in G$.
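For completeness, inequality (2.9a) is elementary and follows from the mean value theorem. The short derivation below is a standard argument supplied for the reader and is not taken from [21].

```latex
\begin{align*}
|e^{a} - e^{b}| &= e^{c}\,|a - b| \quad \text{for some } c \text{ between } a \text{ and } b \\
&\le \max(e^{a}, e^{b})\,|a - b| \\
&\le |a - b|\,|e^{a} + e^{b}|.
\end{align*}
```

In the proof it is applied with $a$ and $b$ taken to be the two Girsanov exponents whose exponentials are the Radon-Nikodym derivatives appearing in (2.8).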


Additional background results. We will also need the results of Theorems
2.2 to 2.8, most of which are either taken from [21] or are minor adaptations of
such results. Where an elaboration on a proof in [21] would be useful, additional
comments will be made. Although the reference does not deal with games, the
fact that the product $m(x, d\alpha) = m_1(x, d\alpha_1) m_2(x, d\alpha_2)$ is a relaxed feedback
control allows the results to be carried over.

Theorem 2.2. (From [21, Theorems 3.1-3.3, Chapter 4].) Assume (A2.1)-(A2.4).
The process $x(\cdot)$ defined by (2.4) has a unique invariant measure $\mu_m(\cdot)$
for each relaxed feedback control $m(x, d\alpha) = m_1(x, d\alpha_1) m_2(x, d\alpha_2)$.
Furthermore, the transition function $P^m(x, t, \cdot)$ is mutually absolutely continuous with
respect to Lebesgue measure, uniformly in $m(\cdot)$, $x \in G$, and $t \in [t_0, t_1]$ for any
$0 < t_0 < t_1 < \infty$.

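A smoothed control. Extend the definition of the relaxed feedback control
$m_i(y, \cdot)$ so that it is defined as a relaxed feedback control for all $y \in \mathbb{R}^r$. For
example, let it be concentrated on some fixed value in $U_i$ for $y \notin G$. For small
$\epsilon > 0$ and $x \in G$, define the smoothed control

$$m_{i,\epsilon}(x, \cdot) = \frac{1}{(2\pi\epsilon)^{r/2}} \int_{\mathbb{R}^r} e^{-|y - x|^2 / 2\epsilon}\, m_i(y, \cdot)\, dy, \quad x \in G.$$

Define $m_\epsilon(x, \cdot) = m_{1,\epsilon}(x, \cdot) m_{2,\epsilon}(x, \cdot)$.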

Theorem 2.3. (This is [21, Theorem 3.4, Chapter 4].) Assume (A2.1)-(A2.4).
Then $m_\epsilon(\cdot)$ is a relaxed feedback control and $m_\epsilon(x, \cdot) \Rightarrow m(x, \cdot) = m_1(x, \cdot) m_2(x, \cdot)$ for
almost all $x \in G$. The function $b_{m_\epsilon}(\cdot)$ is continuous for each $\epsilon$, and $b_{m_\epsilon}(x) \to
b_m(x)$ almost everywhere in $G$.
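The effect of the smoothing can be seen numerically in one dimension. The sketch below assumes $G = [0, 1]$, a finite control set, a Gaussian kernel of variance $\epsilon$, and a grid sum with numerically normalized weights in place of the integral over $\mathbb{R}$; the function name `smooth_control` and the test control are hypothetical.

```python
import math

def smooth_control(m, x, eps, default, h=0.01, pad=1.0):
    """Gaussian-smoothed relaxed feedback control at the point x in G = [0, 1].

    m(y) returns a list of probabilities over a finite control set for y in G;
    outside G the control is extended by the fixed distribution `default`.
    The integral over R is replaced by a sum over a padded grid, and the
    Gaussian weights are normalized numerically (assumptions of this sketch).
    """
    n = int(round((1.0 + 2.0 * pad) / h))
    out = [0.0] * len(default)
    total = 0.0
    for j in range(n + 1):
        y = -pad + j * h
        w = math.exp(-(y - x) ** 2 / (2.0 * eps))
        probs = m(y) if 0.0 <= y <= 1.0 else default
        for i, p in enumerate(probs):
            out[i] += w * p
        total += w
    return [v / total for v in out]

# Hypothetical discontinuous control: action 0 on [0, 0.5), action 1 on [0.5, 1].
m = lambda y: [1.0, 0.0] if y < 0.5 else [0.0, 1.0]
smoothed = [smooth_control(m, x, eps=0.01, default=[1.0, 0.0])
            for x in (0.3, 0.5, 0.7)]
```

Each smoothed value is again a probability vector, and the jump of the original control at $0.5$ is replaced by a gradual transition whose width shrinks with $\epsilon$.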

Theorem 2.4. (From [21, Theorem 4.2, Chapter 4].) Assume (A2.1)-(A2.4).
Then $\mu_m(\cdot)$ is continuous in the control in that if $m^n(x, \cdot) \Rightarrow m(x, \cdot)$ for almost
all $x \in G$, then for each Borel set $A \subset G$,

$$\mu_{m^n}(A) \to \mu_m(A).$$


The cost function. We will need the following assumption.

A2.5. The real-valued functions $k_i(\cdot)$ on $G \times U_i$, $i = 1, 2$, are continuous, and $c$
is a vector with nonnegative components.

Define $k(x, \alpha) = k_1(x, \alpha_1) + k_2(x, \alpha_2)$. For a relaxed feedback control $m(\cdot)$,
define $k_m(x) = \int_U k(x, \alpha)\, m(x, d\alpha)$ and

$$\gamma_T(x, m) = \frac{1}{T} E^m_x \int_0^T k_m(x(s))\, ds + \frac{1}{T} E^m_x c'y(T).$$

For relaxed feedback controls, the cost function of interest in this paper is

$$\gamma(m) = \lim_T \gamma_T(x, m). \tag{2.10}$$

We omit the $x = x(0)$ from the argument of $\gamma(m)$, since it will not depend on
the initial condition under our assumptions (see Theorem 2.5). If player $i$ uses
a relaxed control $r_i(\cdot)$, then define

$$k_{i,r_i}(x, t) = \int_{U_i} k_i(x, \alpha_i)\, r_{i,t}(d\alpha_i).$$

If player 1 selects its control first and uses a relaxed feedback control and player
2 selects its control last and uses a relaxed control, then define (the use of lim inf
is just a convention):

$$\gamma_T(x, m_1, r_2) = \frac{1}{T} E^{m_1,r_2}_x \int_0^T [k_{1,m_1}(x(s)) + k_{2,r_2}(x(s), s)]\, ds + \frac{1}{T} E^{m_1,r_2}_x c'y(T),$$

$$\gamma(x, m_1, r_2) = \liminf_T \gamma_T(x, m_1, r_2).$$

If player 2 selects its control first and uses a relaxed feedback control and player
1 uses a relaxed control, define (the use of lim sup is just a convention):

$$\gamma(x, r_1, m_2) = \limsup_T \gamma_T(x, r_1, m_2).$$

Representation of the cost in terms of a stationary system. Let $m(\cdot)$
be a relaxed feedback control. The system (2.4) starts with an arbitrary initial
condition that does not necessarily have the stationary distribution. It turns
out that the limit (2.10) is the same as if the initial condition were distributed
as $\mu_m(\cdot)$. This is the assertion of the next theorem.

Theorem 2.5. (This is [21, Theorem 4.1, Chapter 4].) Assume (A2.1)-(A2.5).
Let $m(\cdot)$ be a relaxed feedback control. Then the $E^m_x y_i(1)$ are continuous
functions of $x$ and

$$\lim_T \gamma_T(x, m) = \gamma(m) = \int k_m(x)\, \mu_m(dx) + \int E^m_x[c'y(1)]\, \mu_m(dx).$$
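The finite-state analog of this representation is easy to check numerically, and it is the form in which an ergodic cost arises for an approximating Markov chain. The sketch below assumes a two-state chain with a hypothetical transition matrix and cost vector, omits the boundary cost term, and compares the stationary average with a long time average started from each state.

```python
def stationary_distribution(P, iters=5000):
    """Invariant row vector mu = mu P of an ergodic transition matrix P,
    computed by repeated multiplication (power iteration)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

def ergodic_cost(P, k):
    """Stationary average cost: sum over states of k(x) mu(x)."""
    mu = stationary_distribution(P)
    return sum(ki * mi for ki, mi in zip(k, mu))

def time_average_cost(P, k, x0, T):
    """(1/T) * expected total cost over T steps, started from state x0."""
    n = len(P)
    dist = [1.0 if i == x0 else 0.0 for i in range(n)]
    total = 0.0
    for _ in range(T):
        total += sum(ki * di for ki, di in zip(k, dist))
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total / T

# Hypothetical data: transition matrix under a fixed feedback control, cost rates.
P = [[0.9, 0.1], [0.5, 0.5]]
k = [1.0, 4.0]
gamma = ergodic_cost(P, k)
```

For this chain the invariant measure is $(5/6, 1/6)$, so the stationary cost is $1.5$, and the time averages from either initial state approach the same number.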

2.3 Existence of Optimal Controls for the Upper and Lower Values

Define the upper and lower values, resp., for the game (fb denotes relaxed
feedback, and rel denotes relaxed controls)

$$\gamma^+ = \inf_{\text{relaxed fb } m_1}\ \sup_{\text{rel controls } r_2} \gamma(m_1, r_2), \tag{2.11a}$$

$$\gamma^- = \sup_{\text{relaxed fb } m_2}\ \inf_{\text{rel controls } r_1} \gamma(r_1, m_2). \tag{2.11b}$$

It is shown below that the use of relaxed controls for the player selecting last
offers no advantage over relaxed feedback controls. In Section 4 it is shown that the
game has a value in that $\gamma^+ = \gamma^- = \gamma$. Then the numerical procedure converges
to $\gamma$ as the discretization level goes to zero (see Section 3).

The definition (2.11a) is interpreted to mean that player 2 supposes that
player 1 has selected a relaxed feedback control for itself, which will be fixed
throughout the game. [I.e., player 1 selects first.] Given this presumed choice
of player 1, player 2 can select any relaxed or relaxed feedback control and will
choose so as to maximize. This maximizing control will exist and will actually
be of the relaxed feedback control form (implied by Theorem 2.8). It will depend
on the presumed choice of player 1. Given this relationship, player 1 will select a
minimizing control. By Theorem 2.8, it will exist and be of the relaxed feedback
form. The interpretation of (2.11b) is analogous.
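The order of play in (2.11) can be illustrated on a toy finite-state game by brute-force enumeration. Everything in the sketch below is hypothetical: a two-state chain, two actions per player, additive ("separated") effects of the actions on the transition probabilities and costs, and pure feedback policies only. Since relaxed (randomized) controls are excluded here, the two values computed need not coincide, but the lower value can never exceed the upper one.

```python
from itertools import product

STATES = (0, 1)
ACTIONS = (0, 1)

# Hypothetical data with "separated" structure: each player's action
# contributes additively to the transition probability and to the cost.
D1 = [[-0.2, 0.1], [-0.1, 0.2]]   # player 1's effect on P(next state = 1)
D2 = [[0.15, -0.1], [0.1, -0.2]]  # player 2's effect on P(next state = 1)
K1 = [[1.0, 2.0], [3.0, 0.5]]     # player 1's cost rate k_1(x, a_1)
K2 = [[0.0, 1.0], [-1.0, 2.0]]    # player 2's cost rate k_2(x, a_2)

def ergodic_cost(pol1, pol2):
    """Ergodic cost of the two-state chain under pure feedback policies."""
    p = [0.5 + D1[x][pol1[x]] + D2[x][pol2[x]] for x in STATES]  # P(x -> 1)
    mu1 = p[0] / (p[0] + 1.0 - p[1])       # invariant probability of state 1
    mu = (1.0 - mu1, mu1)
    return sum(mu[x] * (K1[x][pol1[x]] + K2[x][pol2[x]]) for x in STATES)

policies = list(product(ACTIONS, repeat=len(STATES)))  # maps state -> action

# Player 1 (minimizer) commits first: inf over m1 of sup over m2.
upper = min(max(ergodic_cost(m1, m2) for m2 in policies) for m1 in policies)
# Player 2 (maximizer) commits first: sup over m2 of inf over m1.
lower = max(min(ergodic_cost(m1, m2) for m1 in policies) for m2 in policies)
```

The player who commits first is the outer optimization in each line, which matches the interpretation of (2.11a) and (2.11b) given above.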

Theorem 2.6. (This is [21, Theorem 4.3, Chapter 4], adapted to the notation
of the present case.) Assume (A2.1)-(A2.5). For a sequence $\{m^n(\cdot)\}$ of relaxed
feedback controls, let $m^n(x, \cdot)$ converge weakly to $m(x, \cdot)$ for almost all $x \in G$.
Then $\gamma(m^n) \to \gamma(m)$.

For fixed $m_1(\cdot)$, maximize over $m_2(\cdot)$, and let $\{m_2^n(\cdot)\}$ be a maximizing
sequence. Consider measures over the Borel sets of $G \times U$ which are defined by

$$m^n(x, d\alpha)\, dx = m_1(x, d\alpha_1) m_2^n(x, d\alpha_2)\, dx \tag{2.12}$$

and take a weakly convergent subsequence. The limit can be factored into the
form

$$m_1(x, d\alpha_1) \bar{m}_2(x, d\alpha_2)\, dx, \tag{2.13}$$

where $\bar{m}_2(\cdot)$ is a relaxed feedback control for player 2. Since $\bar{m}_2(\cdot)$ depends on
$m_1(\cdot)$, write it as $\bar{m}_2(\cdot) = \bar{m}_2(\cdot; m_1)$. Then, given $m_1(\cdot)$, the relaxed feedback
control $\bar{m}_2(\cdot; m_1)$ is maximizing for player 2 in that

$$\sup_{m_2} \gamma(m_1, m_2) = \gamma(m_1, \bar{m}_2(m_1)).$$

The analogous result holds in the other direction, where player 2 chooses first.


Remark on the proof. First, note that owing to the product form any weak
sense limit of the sequence defined in (2.12) must be of the form (2.13) where
$m_1(\cdot)$ is a relaxed feedback control. The reference [21, Theorem 4.3, Chapter
4] is concerned with a minimization problem. Changing minimization to
maximization and adapting the notation to our case where there are two controls
and one is fixed, it shows that the limit $m_1(x, d\alpha_1) \bar{m}_2(x, d\alpha_2)$ is maximizing,
which is the assertion of the second paragraph of the theorem.

Relaxed controls for the player who chooses last. Suppose that with
$m_1(\cdot)$ fixed, player 2 is allowed to use relaxed controls and not simply relaxed
feedback controls. The following theorem says that the maximization over this
larger class will not yield a better result for player 2. The analog of the result
for player 2 choosing first also holds.

Theorem 2.7. (This is [21, Theorem 6.1, Chapter 4], adapted to the notation
of the present case.) Assume (A2.1)-(A2.5). Fix $m_1(\cdot)$ and let $\bar{m}_2(\cdot; m_1)$ be an
optimal relaxed feedback control and $r_2(\cdot)$ an arbitrary relaxed control for player
2. Then for each $x \in G$,

$$\gamma(x, m_1, r_2) \le \gamma(m_1, \bar{m}_2(m_1)).$$


Theorem 2.8. Assume (A2.1)-(A2.5). Let player 1 go first. Then it has
an optimal control, denoted by $m_1^+(\cdot)$. The analogous result holds if player 2
chooses first, and its optimal control is denoted by $m_2^-(\cdot)$.

Remark on the proof. The proof is essentially a consequence of [21, Theorem
4.3, Chapter 4], just as Theorem 2.6 was. Let player 1 go first and let $\{m_1^n(\cdot)\}$
be a minimizing sequence of relaxed feedback controls. By Theorem 2.6, if
player 1 uses $m_1^n(\cdot)$ then player 2 would use the (maximizing) relaxed feedback
control $\tilde m_2(\cdot\,; m_1^n)$. Following the method of the reference that was used to prove
Theorem 2.6, take a weakly convergent subsequence of the sequence of measures
on the Borel sets of $G \times U$ that is defined by $m_1^n(x, d\alpha_1)\,\tilde m_2(x, d\alpha_2; m_1^n)\,dx$,
and denote the limit by $m_1^+(x, d\alpha_1)\,\tilde m_2(x, d\alpha_2)\,dx$. Any weak sense limit must
have this form, where $m_1^+(\cdot)$ and $\tilde m_2(\cdot)$ are relaxed feedback controls. For
notational simplicity, let $n$ index the weakly convergent subsequence. Then
we must have $m_1^n(x, \cdot) \Rightarrow m_1^+(x, \cdot)$ and $\tilde m_2(x, \cdot\,; m_1^n) \Rightarrow \tilde m_2(x, \cdot)$ for almost all
$x \in G$.

We need to show that $m_1^+(\cdot)$ is optimal for player 1 if it chooses first, and
that it can be supposed that $\tilde m_2(\cdot) = \tilde m_2(\cdot\,; m_1^+)$. Since $\{m_1^n(\cdot)\}$ is minimizing
for player 1 when it chooses first, $\gamma(m_1^n, \tilde m_2(m_1^n)) \to \gamma^+$. Suppose that
$\gamma^+ < \sup_{m_2} \gamma(m_1^+, m_2)$. Then there is $\hat m_2(\cdot)$ such that $\gamma^+ < \gamma(m_1^+, \hat m_2)$. Now,
let player 2 use $\hat m_2(\cdot)$ instead of $\tilde m_2(\cdot\,; m_1^n)$ for large $n$. Since the sequence
defined by $m_1^n(x, d\alpha_1)\,\hat m_2(x, d\alpha_2)\,dx$ converges weakly to the measure defined by
$m_1^+(x, d\alpha_1)\,\hat m_2(x, d\alpha_2)\,dx$, Theorem 2.6 implies that $\gamma(m_1^n, \hat m_2) \to \gamma(m_1^+, \hat m_2) > \gamma^+$.
This contradicts the fact that $\{m_1^n(\cdot)\}$ is minimizing, since it implies that
there is $\epsilon > 0$ such that $\gamma(m_1^n, \hat m_2) > \gamma^+ + \epsilon$ for large $n$. Thus $m_1^+(\cdot)$ is optimal
for player 1 if it chooses first. Since $\gamma^+ = \gamma(m_1^+, \tilde m_2)$, without loss of generality
we can suppose that $\tilde m_2(\cdot) = \tilde m_2(\cdot\,; m_1^+)$.

Remark on smooth nearly optimal controls. In Section 4 we will need the
fact that the optimal relaxed feedback controls for either player can be smoothed
with little loss. In particular, suppose that player 1 chooses first, let $\epsilon > 0$, and
replace $m_1^+(\cdot)$ by the smoothed $m_{1,\epsilon}^+(\cdot)$ as defined above Theorem 2.3. It is true
that

$\lim_{\epsilon \to 0} \sup_{m_2} \gamma(m_{1,\epsilon}^+, m_2) = \gamma^+.$  (2.14)

To prove (2.14), suppose that it does not hold in that there is $\delta > 0$ such that

$\limsup_{\epsilon \to 0} \sup_{m_2} \gamma(m_{1,\epsilon}^+, m_2) > \gamma^+ + \delta.$  (2.15)

Then there are $m_{2,\epsilon}(\cdot)$ such that $\gamma(m_{1,\epsilon}^+, m_{2,\epsilon}) > \gamma^+ + \delta/2$ for all small $\epsilon > 0$.
Let $\epsilon$ index a weakly convergent subsequence of $m_{1,\epsilon}^+(x, d\alpha_1)\,m_{2,\epsilon}(x, d\alpha_2)\,dx$.
The limit can be written as $m_1^+(x, d\alpha_1)\,\tilde m_2(x, d\alpha_2)\,dx$ for some relaxed feedback
control $\tilde m_2(\cdot)$. By Theorem 2.6, $\gamma(m_{1,\epsilon}^+, m_{2,\epsilon}) \to \gamma(m_1^+, \tilde m_2) \ge \gamma^+ + \delta/2$, a
contradiction to the optimality of $m_1^+(\cdot)$ for player 1 if it chooses first. Obviously,
there is an analog if player 2 chooses first.

3  Convergence  of  the  Numerical  Procedure 

Discuss  the  connection. 

3.1 The Markov Chain Approximation Method

The numerical method to be employed is the Markov chain approximation
method of [19, 20, 22]. The approximating processes are the same, but the
numerical problem to be solved is an ergodic cost problem for a Markov chain.
The method approximates the system process (2.4) by a discrete parameter
finite state controlled Markov chain that is "locally consistent" with (2.4). The
cost function is also approximated and the game problem is then solved. Some
basic facts from [22] concerning the procedure will now be stated. Let $h$ denote
the approximation parameter. Many methods for getting suitable approximating
chains are in the references (e.g., see [22, Chapter 5]). The approximating
chain and local consistency conditions are the same for the game problems of
this paper. In the present case, where $\sigma(x)\sigma'(x)$ is uniformly positive definite,
for each small fixed value of $h$ the constructed chains can be selected to be
ergodic for each control [22, Chapter 7], and this will be assumed to be the
case. In fact, the chains can be chosen such that for each small $h$, the rate of
convergence of the transition functions to the invariant measure (as time goes
to infinity) will be uniform in the control. See [22, Chapter 7] for a discussion
of the setup and convergence for the pure control problem.

To construct the approximation, one first defines $S_h$, a discretization of $\mathbb{R}^r$.
For example, $S_h$ might be a regular $h$-grid. The precise requirements are quite
weak and it is only the points in $G$ and their immediate neighbors that are of
interest. The state space for the chain is divided into two parts. The first part
is $G_h = G \cap S_h$, on which the chain approximates the diffusion part of (2.4). If
the chain tries to leave $G_h$, then it is returned immediately, consistently with
the local reflection direction. Thus, define $\partial G_h^+$ to be the set of points not in
$G_h$ to which the chain might move in one step from some point in $G_h$. The set
$\partial G_h^+$ is an approximation to the reflecting boundary. The use of $\partial G_h^+$ simplifies
the analysis and allows us to get a reflection process $z^h(\cdot)$ that is analogous to
$z(\cdot)$.
Local consistency on $G_h$. Let $u_n^h = (u_{1,n}^h, u_{2,n}^h)$ denote the controls used
at step $n$ for the approximating chain $\xi_n^h$. Let $E_{x,n}^{h,\alpha}$ (respectively, $\mathrm{covar}_{x,n}^{h,\alpha}$)
denote the expectation (respectively, the covariance) given all of the data to
step $n$, when $\xi_n^h = x$, $u_n^h = \alpha$. Then the chain satisfies the following consistency
condition. There is $\Delta t^h(x, \alpha) = \Delta t^h \to 0$ (it does not depend on $(x, \alpha)$ for
$x \in G$) such that

$E_{x,n}^{h,\alpha}[\xi_{n+1}^h - x] = b(x, \alpha)\Delta t^h + o(\Delta t^h),$

$\mathrm{covar}_{x,n}^{h,\alpha}[\xi_{n+1}^h - x] = a(x)\Delta t^h + o(\Delta t^h), \quad a(x) = \sigma(x)\sigma'(x),$  (3.1)

$|\xi_{n+1}^h - x| \le K_1 h,$

for some real $K_1$. The $o(\Delta t^h)$ terms are uniform in $(x, \alpha)$. Let $P^h(x, y|\alpha_1, \alpha_2) =
P^h(x, y|\alpha)$ denote the one-step transition probabilities. With the methods in
[22], $\Delta t^h$ is obtained automatically as a byproduct of getting the $P^h(x, y|\alpha)$,
and it is used as an interpolation interval. More generally, $\Delta t^h$ can depend on
$x, \alpha$. But for theoretical purposes for the ergodic cost problem, the problem is
rescaled to get constant intervals. See the discussion in [22, Chapter 7]. By
(3.1), in $G$ the conditional first two moments of $\xi_{n+1}^h - \xi_n^h$ are close to
those of the differences of the solution to (2.4).
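The consistency condition (3.1) can be checked numerically on a simple case. The following is a minimal sketch, assuming a one-dimensional model $dx = b\,dt + \sigma\,dw$ and the standard upwind finite-difference construction of the kind described in [22, Chapter 5]; it is not taken from this paper. The function names `upwind_transition` and `consistency_errors` are hypothetical, and the interval `dt` here depends on the drift value, which is the more general (unrescaled) case mentioned above.

```python
def upwind_transition(b, sigma, h):
    """Upwind transition probabilities on a grid of spacing h (1-D, illustrative).

    Returns (p_up, p_down, dt): probabilities of moving to x+h and x-h,
    and the interpolation interval dt obtained as a byproduct.
    """
    b_plus, b_minus = max(b, 0.0), max(-b, 0.0)
    q = sigma ** 2 + h * abs(b)          # normalizing factor
    dt = h ** 2 / q                      # interpolation interval
    p_up = (sigma ** 2 / 2.0 + h * b_plus) / q
    p_down = (sigma ** 2 / 2.0 + h * b_minus) / q
    return p_up, p_down, dt


def consistency_errors(b, sigma, h):
    """Return (mean error, variance error), each divided by dt.

    Local consistency requires both quantities to go to zero as h -> 0.
    """
    p_up, p_down, dt = upwind_transition(b, sigma, h)
    mean = h * p_up - h * p_down
    var = h ** 2 * (p_up + p_down) - mean ** 2
    return (mean - b * dt) / dt, (var - sigma ** 2 * dt) / dt


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001):
        print(h, consistency_errors(1.5, 1.0, h))
```

In this construction the mean condition holds exactly, and the covariance error per unit interval is of order $h$, so both lines of (3.1) are satisfied as $h \to 0$.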

The first two lines of (3.1) give the conditional moments for any fixed control
values $\alpha = (\alpha_1, \alpha_2)$. Suppose that the control is chosen at random, depending
only on the current state (i.e., it is randomized feedback). Let $m_i^h(x, d\alpha_i)$ denote
the associated probability, conditioned on the past and on the current state
value $x$, and define $m^h(x, d\alpha) = m_1^h(x, d\alpha_1)\,m_2^h(x, d\alpha_2)$. Then the transition
probability is

$\int_U P^h(x, y|\alpha_1, \alpha_2)\,m_1^h(x, d\alpha_1)\,m_2^h(x, d\alpha_2).$

The first two lines of (3.1) are now replaced by

$E_{x,n}^{h,m^h}[\xi_{n+1}^h - x] = b_{m^h}(x)\Delta t^h + o(\Delta t^h),$

$\mathrm{covar}_{x,n}^{h,m^h}[\xi_{n+1}^h - x] = a(x)\Delta t^h + o(\Delta t^h), \quad a(x) = \sigma(x)\sigma'(x).$

Thus, the forms are the same as if relaxed feedback controls were used. Although
the actual sample paths would differ, the transition probabilities are the same
for the randomized and the relaxed feedback forms.


Local consistency on $\partial G_h^+$. From points in $\partial G_h^+$, the transitions of the chain
are such that they move to $G_h$, with the conditional mean direction being a
reflection direction at $x$. More precisely,

$\lim_{h \to 0} \sup_{x \in \partial G_h^+} \mathrm{distance}(x, G_h) = 0,$  (3.3)

and there are $\theta_1 > 0$ and $\theta_2(h) \to 0$ as $h \to 0$ such that for all $x \in \partial G_h^+$,

$E_{x,n}^{h,\alpha}[\xi_{n+1}^h - x] \in \{a\gamma : \gamma \in d(x),\ \theta_2(h) \ge a \ge \theta_1 h\},$

$\Delta t^h(x, \alpha) = 0$ for $x \in \partial G_h^+$.  (3.4)
The last line of (3.4) says that the reflection from states on $\partial G_h^+$ is instantaneous.
Without loss of generality, we can suppose that the transition probabilities are
continuous in the control variables for each $x$ (see [22, Chapter 5] for typical
methods of construction).

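Continuous time interpolation. Only the discrete time chain is needed for
the numerical computations. But, for the proofs of convergence, the chain must
be interpolated into a continuous time process which approximates $x(\cdot)$. The
interpolation intervals are suggested by the $\Delta t^h(\cdot)$ in (3.1) and (3.4). We will
use a Markovian interpolation, called $\psi^h(\cdot)$. Let $\{\Delta\tau_n^h, n < \infty\}$ be conditionally
mutually independent and "exponential" random variables in that

$P_{x,n}^{h,\alpha}\{\Delta\tau_n^h \ge t\} = e^{-t/\Delta t^h(x,\alpha)}.$

Note that $\Delta\tau_n^h = 0$ if $\xi_n^h$ is on the reflecting boundary $\partial G_h^+$. Define $\tau_0^h = 0$, and
for $n > 0$, set $\tau_n^h = \sum_{i=0}^{n-1} \Delta\tau_i^h$. The $\tau_n^h$ will be the jump times of $\psi^h(\cdot)$. Now
define $\psi^h(\cdot)$ and the interpolated reflection processes by

$\psi^h(t) = x(0) + \sum_{\tau_{i+1}^h \le t} [\xi_{i+1}^h - \xi_i^h],$

$Z^h(t) = \sum_{\tau_{i+1}^h \le t} [\xi_{i+1}^h - \xi_i^h]\,I_{\{\xi_i^h \in \partial G_h^+\}},$

$z^h(t) = \sum_{\tau_{i+1}^h \le t} E[\xi_{i+1}^h - \xi_i^h \mid \xi_i^h, u_i^h]\,I_{\{\xi_i^h \in \partial G_h^+\}}.$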

Define the continuous time interpolations $u^h(\cdot)$ of the controls analogously. Let
$r^h(\cdot)$ denote the relaxed control representation of $u^h(\cdot)$. The process $\psi^h(\cdot)$ is a
continuous time Markov chain. When the state is $x$ and control pair is $\alpha$, the
jump rate out of $x \in G_h$ is $1/\Delta t^h(x, \alpha)$. So the conditional mean interpolation
interval is $\Delta t^h(x, \alpha)$; i.e., $E_{x,n}^{h,\alpha}[\tau_{n+1}^h - \tau_n^h] = \Delta t^h(x, \alpha)$.

Define $\tilde z^h(\cdot)$ by $Z^h(t) = z^h(t) + \tilde z^h(t)$. This representation splits the effects
of the reflection into two parts. The first is composed of the "conditional mean"
parts and the second is composed of the perturbations
about these conditional means [22, Section 5.7.9]. Both components can change
only at $t$ where $\psi^h(t)$ can leave $G_h$. Suppose that at some time $t$, $Z^h(t) -
Z^h(t-) \ne 0$, with $\psi^h(t-) = x \in G_h$. Then by (3.4), $z^h(t) - z^h(t-)$ points in
a direction in $d(N_h(x))$ where $N_h(x)$ is a neighborhood with radius that goes
to zero as $h \to 0$. The process $\tilde z^h(\cdot)$ is the "error" due to the centering of
the increments of the reflection term about their conditional means. It has
bounded (uniformly in $x$, $h$) second moments and it converges to zero, as will
be seen in Theorem 3.1. By (A2.1), (A2.2), and the local consistency condition
(3.4), we can write (modulo an asymptotically negligible term)

$z^h(t) = \sum_i d_i\,y_i^h(t),$

where $y_i^h(0) = 0$, and $y_i^h(\cdot)$ is nondecreasing and can increase only when $\psi^h(t)$
is arbitrarily close (as $h \to 0$) to the $i$th face of $\partial G$.


A representation for $\psi^h(\cdot)$. The process $\psi^h(\cdot)$ has a representation which
resembles (2.4), and is useful in the convergence proofs. Let $\psi^h(0) = x$. By [22,
Sections 5.7.3 and 10.4.1], we can write

$\psi^h(t) = x + \int_0^t b(\psi^h(s), u^h(s))\,ds + \int_0^t \sigma(\psi^h(s))\,dw^h(s) + Z^h(t) + \epsilon^h(t),$  (3.5)

where $\psi^h(t) \in G$. The process $\epsilon^h(\cdot)$ is due to the $o(\cdot)$ terms in (3.1) and is
asymptotically unimportant in that, for any $T$, $\lim_h \sup_{x, u^h} \sup_{s \le T} E|\epsilon^h(s)|^2 = 0$.
The process $w^h(\cdot)$ is a martingale with respect to the filtration induced by
$(\psi^h(\cdot), u^h(\cdot), w^h(\cdot))$, and converges weakly to a standard (vector-valued) Wiener
process. The $w^h(t)$ is obtained from $\{\psi^h(s), s \le t\}$. All of the processes in (3.5)
are constant on the intervals $[\tau_n^h, \tau_{n+1}^h)$.

Let $|z^h|(T)$ denote the variation of the process $z^h(\cdot)$ on the time interval
$[0, T]$. Then we have the following theorem from [22].


Theorem 3.1. ([22, Theorem 11.1.3 and (5.7.5)].) Assume (A2.1), (A2.2), the
local consistency conditions, and let $b(\cdot)$ and $\sigma(\cdot)$ be bounded and measurable.
Then for any $T < \infty$, there are $K_2 < \infty$ and $\delta_h$, where $\delta_h \to 0$ as $h \to 0$, and
which do not depend on the controls or initial condition, such that

$E|z^h|^2(T) \le K_2,$  (3.6)

$E \sup_{s \le T} |\tilde z^h(s)|^2 = \delta_h E|z^h|(T).$  (3.7)

Owing to the fact that the reflection directions at any corner or edge are linearly
independent, the inequalities hold for $y^h(\cdot)$ replacing $z^h(\cdot)$.


The cost function and upper and lower values for the discrete game.

Relaxed feedback controls, when applied to the Markov chain, are equivalent
to randomized controls. Let $u^h(\cdot) = (u_1^h(\cdot), u_2^h(\cdot))$ be feedback controls for the
approximating chain. Then the cost is

$\gamma_T^h(x, u^h) = \gamma_T^h(x, u_1^h, u_2^h) = \frac{1}{T} E_x^{u^h} \int_0^T k(\psi^h(s), u^h(\psi^h(s)))\,ds + E_x^{u^h} \frac{c'y^h(T)}{T},$

$\gamma^h(u^h) = \lim_T \gamma_T^h(x, u^h).$  (3.8)

Now suppose that $m^h(\cdot)$ represents a randomized control (as discussed above
(3.2)). Then the cost function can be written as

$\gamma_T^h(x, m_1^h, m_2^h) = \frac{1}{T} E_x^{m^h} \int_0^T k_{m^h}(\psi^h(s))\,ds + E_x^{m^h} \frac{c'y^h(T)}{T},$

$\gamma^h(m^h) = \lim_T \gamma_T^h(x, m^h).$  (3.9)

With the relaxed feedback control representation of an ordinary feedback
control, (3.8) is a special case of (3.9). Also, we can always take the controls in
(3.9) to be randomized feedback.

Suppose that player 1 chooses its control first and uses the relaxed feedback
(or randomized feedback) control $m_1^h(\cdot)$. Then player 2 has a maximization
problem for a finite state Markov chain. The approximating chain is ergodic for
any feedback control, whether randomized or not. Then, since the transition
probabilities and cost rates are continuous in the control of the second player,
the optimal control of the second player exists and is a pure feedback control
(not randomized) [8, volume 2], [25]. The cost does not depend on the initial
condition. The analogous situation holds if player 2 chooses its control first.
These facts will be used in the next theorem. We use $m_i^h(\cdot)$ to denote either a
randomized feedback, relaxed feedback, or the relaxed feedback representation
of an ordinary feedback control. Define the upper and lower values, respectively:

$\gamma^{+,h} = \inf_{m_1^h} \sup_{m_2^h} \gamma^h(m_1^h, m_2^h),$

$\gamma^{-,h} = \sup_{m_2^h} \inf_{m_1^h} \gamma^h(m_1^h, m_2^h).$

Under our hypotheses, the upper and lower values might be different, although
Theorem 3.2 says that they converge to the same value asymptotically. If the
dynamics are separated in the sense that $P^h(x, y|\alpha)$ can be written as a function
of $(x, y, \alpha_1)$ plus a function of $(x, y, \alpha_2)$, then $\gamma^{+,h} = \gamma^{-,h}$. [The proof is similar
to that giving the analogous result in Section 4, except that the state space is
discrete here.] One can choose the transition probability so that it is separated,
if desired.
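Once both players' feedback controls are fixed, the chain is an ordinary ergodic finite state Markov chain, and the fact that the cost does not depend on the initial condition can be seen by computing the cost from the invariant measure. The following is a minimal sketch under simplifying assumptions: a generic ergodic transition matrix `P` and a per-step cost vector `k` (both hypothetical inputs), with no reflection cost term and a constant interpolation interval. The function names are illustrative only.

```python
def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Invariant measure of an ergodic chain by power iteration.

    P is a row-stochastic matrix given as a list of lists.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def ergodic_cost(P, k):
    """Average cost per step: the invariant measure integrated against k."""
    pi = stationary_distribution(P)
    return sum(p * c for p, c in zip(pi, k))


if __name__ == "__main__":
    # Two-state example: transition matrix under some fixed feedback pair.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    k = [1.0, 7.0]
    print(ergodic_cost(P, k))
```

For the two-state example the invariant measure is (5/6, 1/6), so the average cost is 2 regardless of the starting state. Power iteration is used only to keep the sketch dependency-free; solving the linear system for the invariant measure would serve equally well.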


3.2 Convergence of the Numerical Procedure


Theorem 3.2. Assume (A2.1)-(A2.5) and suppose that^1

$\gamma^+ = \gamma^- = \gamma.$  (3.10)

Then

$\gamma^- \le \liminf_h \gamma^{-,h} \le \limsup_h \gamma^{+,h} \le \gamma^+.$  (3.11)

Hence

$\lim_h \gamma^{+,h} = \lim_h \gamma^{-,h} = \gamma,$  (3.12)

and both the upper and lower values for the numerical approximation converge
to the value for the original game.

^1 Equation (3.10) will be proved in the next section.


Proof. Let player 1 choose its control first and let $\epsilon > 0$. Let $m_{1,\epsilon}^+(\cdot)$ be an
$\epsilon$-smoothing of the optimal control $m_1^+(\cdot)$ for player 1, when it chooses first, as
discussed at the end of Section 2. That discussion implies that, given $\delta > 0$,
there is $\epsilon > 0$ such that $m_{1,\epsilon}^+(\cdot)$ is $\delta$-optimal for player 1 for the original problem.
Now, let player 1 use $m_{1,\epsilon}^+(\cdot)$ on the approximating chain, either as a randomized
feedback or a relaxed feedback control. Given that player 1 chooses first and
uses $m_{1,\epsilon}^+(\cdot)$, we have a simple control problem for player 2. As noted above,
the optimal control for player 2 exists and is pure feedback, and we denote it
by $\tilde u_2^h(\cdot)$, with relaxed feedback control representation $\tilde m_2^h(\cdot)$.
By the definition of the upper value,

$\gamma^{+,h} \le \sup_{u_2^h} \gamma^h(m_{1,\epsilon}^+, u_2^h) = \sup_{m_2^h} \gamma^h(m_{1,\epsilon}^+, m_2^h) = \gamma^h(m_{1,\epsilon}^+, \tilde u_2^h),$  (3.13)

where $u_2^h(\cdot)$ denotes an arbitrary ordinary feedback control, and $m_2^h(\cdot)$ an
arbitrary randomized feedback control. The maximum value $\gamma^h(m_{1,\epsilon}^+, \tilde u_2^h)$ of the
control problem for player 2 with player 1's control fixed at $m_{1,\epsilon}^+(\cdot)$ does not
depend on the initial condition. Hence, without loss of generality, the
corresponding continuous time interpolation $\psi^h(\cdot)$ can be considered to be
stationary. Then, using the continuity in $(x, \alpha_2)$ of $\int_{U_1} b(x, \alpha)\,m_{1,\epsilon}^+(x, d\alpha_1)$ and
of $\int_{U_1} k(x, \alpha)\,m_{1,\epsilon}^+(x, d\alpha_1)$ (and replacing the minimization problem by a
maximization problem), yields [22, Theorem 3.1, Chapter 11] that there is a relaxed
control $\tilde r_2(\cdot)$ for the original problem such that:^2

$\limsup_h \gamma^{+,h} \le \limsup_h \gamma^h(m_{1,\epsilon}^+, \tilde u_2^h) = \gamma(m_{1,\epsilon}^+, \tilde r_2) \le \gamma^+ + \delta.$  (3.14)

The last inequality of (3.14) follows from Theorem 2.7 and the $\delta$-optimality of
$m_{1,\epsilon}^+(\cdot)$ in the class of relaxed feedback controls for player 1 if it chooses first.

Now, let player 2 choose first. Then there is an analogous result with
analogous notation. In particular, given $\delta > 0$, there is an $\epsilon > 0$ and an $\epsilon$-smoothing
$m_{2,\epsilon}^-(\cdot)$ of the optimal control, and a relaxed control $\tilde r_1(\cdot)$ for the original problem
(2.4) such that

$\liminf_h \gamma^{-,h} \ge \liminf_h \gamma^h(\tilde u_1^h, m_{2,\epsilon}^-) = \gamma(\tilde r_1, m_{2,\epsilon}^-) \ge \gamma^- - \delta.$  (3.15)

Hence, since $\delta$ is arbitrary, (3.11) holds. This, with (3.10), yields the theorem.


^2 In [22, Theorem 3.1, Chapter 11], the symbol $m(\cdot)$ is used for a relaxed control and not
a relaxed feedback control. That reference does not use relaxed feedback controls.
4 Existence of the Value of the Game


An approach to the proof. The existence of the value, namely (3.10), will
be proved in this section. Before proceeding with the proof, we will motivate
what will be needed by outlining a tentative approach. The outline is purely
formal. But, later, it will be seen that the method can be carried out.

Suppose for the moment that the game for the numerical approximation has
a value in that $\gamma^{+,h} = \gamma^{-,h}$, and let there be controls $\tilde m_i^h(\cdot)$, $i = 1, 2$,
for the numerical method (written in relaxed feedback form) which attain the
value, no matter who chooses first. I.e., $\tilde m_i^h(\cdot)$ is optimal for player $i$ whether it
chooses its control first or last. Thus,

$\gamma^{+,h} = \gamma^{-,h} = \gamma^h(\tilde m_1^h, \tilde m_2^h).$  (4.1)

Suppose also that there are relaxed feedback controls $\tilde m_i(\cdot)$ such that, for some
subsequence of $h \to 0$,

$\tilde m_1^h(x, d\alpha_1)\,\tilde m_2^h(x, d\alpha_2)\,dx \Rightarrow \tilde m_1(x, d\alpha_1)\,\tilde m_2(x, d\alpha_2)\,dx.$  (4.2)

Finally, suppose that for any sequence (indexed by $h \to 0$) of relaxed feedback
controls $\{m_i^h(\cdot)\}$, $i = 1, 2$, for which $m_1^h(x, d\alpha_1)\,m_2^h(x, d\alpha_2)\,dx$ converges weakly
to, say, $m_1(x, d\alpha_1)\,m_2(x, d\alpha_2)\,dx$, we have the convergence of the costs

$\gamma^h(m_1^h, m_2^h) \to \gamma(m_1, m_2).$  (4.3)

Then by (3.11) it follows that

$\gamma^- \le \gamma(\tilde m_1, \tilde m_2) \le \gamma^+.$

We claim that, under the above hypotheses, the limit control $\tilde m_i(\cdot)$ is
optimal for player $i$ if it chooses first. To prove this claim one can proceed as
follows. Suppose that $\tilde m_1(\cdot)$ is not optimal for player 1 if it chooses first,
in that $\sup_{m_2} \gamma(\tilde m_1, m_2) > \gamma^+$. Then there are $\delta > 0$ and $\hat m_2(\cdot)$ such that
$\gamma(\tilde m_1, \hat m_2) > \gamma^+ + 2\delta$. Following the approach in Theorem 3.2, for $\epsilon > 0$ let
$\hat m_{2,\epsilon}(\cdot)$ be an $\epsilon$-smoothing of $\hat m_2(\cdot)$. Then, for small $\epsilon > 0$, $\gamma(\tilde m_1, \hat m_{2,\epsilon}) > \gamma^+ + \delta$.
Then apply $\hat m_{2,\epsilon}(\cdot)$ to the approximating controlled process $\psi^h(\cdot)$ to get a
contradiction to the optimality of $(\tilde m_1^h(\cdot), \tilde m_2^h(\cdot))$ for small $h$. Such a contradiction
implies that $\sup_{m_2} \gamma(\tilde m_1, m_2) \le \gamma^+$. But the strict inequality $<$ is impossible
due to the definition of the upper value. Hence $\sup_{m_2} \gamma(\tilde m_1, m_2) = \gamma^+$, as
desired.

To get the desired contradiction to the optimality of $(\tilde m_1^h(\cdot), \tilde m_2^h(\cdot))$ for small
$h$, let $h$ index a weakly convergent subsequence of the measures defined in the
left side of (4.2). The limit must be of the form on the right side of (4.2) for some
$\tilde m_i(\cdot)$, $i = 1, 2$, where $\tilde m_i^h(x, \cdot) \Rightarrow \tilde m_i(x, \cdot)$ for almost all $x \in G$, $i = 1, 2$. Apply
the control pair $(\tilde m_1^h(\cdot), \hat m_{2,\epsilon}(\cdot))$ to $\psi^h(\cdot)$. Then (along the chosen subsequence
of $h$)

$\tilde m_1^h(x, d\alpha_1)\,\hat m_{2,\epsilon}(x, d\alpha_2)\,dx \Rightarrow \tilde m_1(x, d\alpha_1)\,\hat m_{2,\epsilon}(x, d\alpha_2)\,dx.$

Since (4.3) implies that $\gamma^h(\tilde m_1^h, \hat m_{2,\epsilon}) \to \gamma(\tilde m_1, \hat m_{2,\epsilon})$, for small enough $\epsilon$ and
$h$, we must have $\gamma^h(\tilde m_1^h, \hat m_{2,\epsilon}) > \gamma^{+,h} + \delta/2$, which is a contradiction to the
optimality of $(\tilde m_1^h(\cdot), \tilde m_2^h(\cdot))$. We can now conclude that

$\sup_{m_2} \gamma(\tilde m_1, m_2) = \gamma^+ = \gamma(\tilde m_1, \tilde m_2).$  (4.4)

Thus, if player 1 chooses its control first and uses its optimal control $\tilde m_1(\cdot)$, then
$\tilde m_2(\cdot)$ is optimal for player 2. By repeating the procedure with the order of the
players reversed, we can finally conclude that, if (4.1)-(4.3) hold (at least for
some subsequence of $h$), then (3.10) holds.

The approach outlined above for proving (3.10) is attractive. But it cannot
work for the class of processes $\psi^h(\cdot)$ which are used for the actual Markov chain
approximation numerical method in Section 3, since for each $h$, the state space
is only some finite set. Hence, the controls are not defined for all $x \in G$, and
the transition function is not mutually absolutely continuous with respect to
Lebesgue measure. However, in this section we are concerned only with proving
(3.10), and not with the numerical procedure. Thus, we can use the approach
which was outlined above for an appropriately chosen alternative approximating
process for which (3.11) also holds. A discrete time process will be constructed
for which (3.11) and (4.1)-(4.3) hold. This process is to be used solely to prove
(3.10). It is not suitable for numerical solution. For future use, note that if
the $m_i^h(\cdot)$, $i = 1, 2$, are relaxed feedback controls for each $h$ and the $m_i^h(x, \cdot)$ are
defined for almost all $x$, then there is always a subsequence and relaxed feedback
controls $m_i(\cdot)$, $i = 1, 2$, for which (4.2) holds.

An alternative approximating process. To get the approximating process,
time will be discretized but not space. Let $\Delta > 0$ denote the time discretization
interval. We need to construct a process whose $n$-step transition functions
$P^\Delta(x, n\Delta, \cdot\,|\,\alpha)$ have densities that are mutually absolutely continuous with
respect to Lebesgue measure, uniformly in ($\Delta$, control, $t_0 \le n\Delta \le t_1$) for any
$0 < t_0 < t_1 < \infty$.

Consider the following procedure. Start with the process (2.4), but with the
controls held constant on the intervals $[l\Delta, l\Delta + \Delta)$, $l = 0, 1, \ldots$. The discrete
approximation will be the samples at times $l\Delta$, $l = 0, 1, \ldots$. The controls are
chosen at $t = 0$, with one of the players selected to choose first, just as for the
original game. Let $u_i^\Delta(\cdot)$, $i = 1, 2$, denote the controls, if in pure feedback (not
relaxed or randomized) form. In relaxed control notation write the controls as
$m_i^\Delta(\cdot)$, $i = 1, 2$. These controls are used henceforth, whenever control is applied.
The chosen controls are applied at random as follows. At each time, only one
of the players will use its control. At each time $l\Delta$, $l = 0, 1, \ldots$, flip a fair coin.
With probability 1/2, player 1 will use its control during the interval $[l\Delta, l\Delta + \Delta)$
and player 2 not. Otherwise, player 2 will use its control, and player 1 not. The
values of the controls during the interval will depend on the state at its start.
The optimal controls will be feedback. Define $x^\Delta(t) = x(l\Delta)$ on $[l\Delta, l\Delta + \Delta)$. For
pure (not randomized or relaxed) feedback controls $u_i^\Delta(\cdot)$, $i = 1, 2$, the system
is

$dx = b^\Delta(x, u^\Delta(x^\Delta))\,dt + \sigma(x)\,dw + dz,$  (4.5a)

where the value of $b^\Delta(\cdot)$ is determined by the coin tossing randomization
procedure at the times $l\Delta$, $l = 0, 1, \ldots$. In particular, at $t \in [l\Delta, l\Delta + \Delta)$, $b^\Delta(x, u^\Delta(x^\Delta))$
is $2b_i(x(t), u_i^\Delta(x^\Delta(t)))$, for either $i = 1$ or $i = 2$ according to the random choice
made at $l\Delta$. If the control is relaxed feedback, then write the model as

$dx = b^\Delta(x, m^\Delta(x^\Delta))\,dt + \sigma(x)\,dw + dz,$  (4.5b)

where at $t \in [l\Delta, l\Delta + \Delta)$, $b^\Delta(x, m^\Delta(x^\Delta))$ is $2\int_{U_i} b_i(x(t), \alpha_i)\,m_i^\Delta(x(l\Delta), d\alpha_i)$, for
either $i = 1$ or $i = 2$ according to the random choice made at $l\Delta$. Following the
Girsanov transformation based usage in (2.4), the Wiener process $w(\cdot)$ should be
indexed by the controls $u^\Delta(\cdot)$ or $m^\Delta(\cdot)$, but we omit it for notational simplicity.
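The factor 2 in the randomized drift compensates for each player acting only half of the time, so that the drift averaged over the coin toss is the separated drift $b_1 + b_2$ of the original model. The following minimal sketch illustrates that bookkeeping. It assumes the drift values `b1` and `b2` have already been evaluated at the current state and control actions; the function names are hypothetical.

```python
def randomized_drift(b1, b2, player):
    """Drift used on one interval [l*Delta, l*Delta + Delta).

    b1, b2: values of the separated drift terms b_1(x, a_1), b_2(x, a_2).
    player: 1 or 2, the player selected by the fair coin for this interval.
    """
    return 2.0 * (b1 if player == 1 else b2)


def mean_drift(b1, b2):
    """Average of the randomized drift over the fair coin toss."""
    return 0.5 * randomized_drift(b1, b2, 1) + 0.5 * randomized_drift(b1, b2, 2)


if __name__ == "__main__":
    b1, b2 = 0.3, -1.2
    # The averaged drift agrees with the separated drift b1 + b2.
    print(mean_drift(b1, b2), b1 + b2)
```

On any single interval the drift depends on only one player's action, which is what makes the one-step transition function a sum of a term in $(x, \alpha_1)$ and a term in $(x, \alpha_2)$.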

Let $E_x^{\Delta,i,\alpha_i}$ denote the expectation of functionals on $[l\Delta, l\Delta + \Delta)$ when player
$i$ acts on that interval and uses control action $\alpha_i$. Let $P_i^\Delta(x, \cdot\,|\,\alpha_i)$ denote the
measure of $x(\Delta)$, given that the initial condition is $x$, player $i$ acts and uses
control action $\alpha_i$. The conditional mean increment in the total cost function on
the time interval $[l\Delta, l\Delta + \Delta)$ is, for $u_i^\Delta(x(l\Delta)) = \alpha_i$, $i = 1, 2$,

$C^\Delta(x(l\Delta), \alpha) = \frac{1}{2} \sum_{i=1,2} E_{x(l\Delta)}^{\Delta,i,\alpha_i} \left[ \int_{l\Delta}^{l\Delta+\Delta} 2k_i(x(s), \alpha_i)\,ds + c'(y(l\Delta + \Delta) - y(l\Delta)) \right].$  (4.6)

Note that $C^\Delta(x, \alpha)$ is the sum of two terms, one depending on $(x, \alpha_1)$ and the
other on $(x, \alpha_2)$. The weak sense uniqueness of the solution to (2.4) for any
control and initial condition implies the following result.


Theorem 4.1. Assume (A2.1)-(A2.5). Then for each $\Delta > 0$, $C^\Delta(\cdot)$ is
continuous and the measures $P_i^\Delta$ are weakly continuous in that for any bounded
and continuous real-valued function $f(\cdot)$, $\int f(y)\,P_i^\Delta(x, dy|\alpha_i)$ and $C^\Delta(x, \alpha)$ are
continuous in $(x, \alpha)$.

The reason for choosing the acting controls at random at each time $l\Delta$, $l =
0, 1, \ldots$, is that the randomization "separates" the cost rates and dynamics in
the controls for the two players. By separation, we mean that both the cost
function and transition function are the sum of two terms, one depending on
$(x, \alpha_1)$ and the other on $(x, \alpha_2)$. This separation is important since it gives the
"Isaacs condition" which is needed to assure the existence of a value for the
game for the discrete time process, as seen in Theorem 4.2. Proceeding formally
at this point, let $\mu_{m^\Delta}^\Delta(\cdot)$ denote the invariant measure under the control $m^\Delta(\cdot)$.
Define the stationary cost increment

$\lambda^\Delta(m^\Delta) = \int_G \mu_{m^\Delta}^\Delta(dx) \int_U C^\Delta(x, \alpha)\,m^\Delta(x, d\alpha).$

Note that, due to the scaling, $\lambda^\Delta(m^\Delta)$ is an average over an interval of length
$\Delta$: hence $\lambda^\Delta(m^\Delta) = \Delta\gamma^\Delta(m^\Delta)$. Suppose for the moment that there is an
optimal control $m_i^\Delta(\cdot)$, $i = 1, 2$, for each $\Delta > 0$ and define $\lambda^\Delta = \lambda^\Delta(m^\Delta)$. The
"separation" is easily seen from the formal Isaacs equation for the value of the
discrete time problem, namely,

$\lambda^\Delta + g^\Delta(x) = \inf_{\alpha_1} \sup_{\alpha_2} \left[ \frac{1}{2} \int g^\Delta(x + y)\,P_1^\Delta(x, dy|\alpha_1) + \frac{1}{2} \int g^\Delta(x + y)\,P_2^\Delta(x, dy|\alpha_2) + C^\Delta(x, \alpha) \right],$  (4.7)

where $g^\Delta(\cdot)$ is the relative value or potential function.
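The role of separation in the Isaacs condition can be seen on a toy static game. When the bracketed term is a function of $\alpha_1$ plus a function of $\alpha_2$, the inf and sup act on different terms and can be interchanged. The following minimal sketch compares the two orders of optimization over finite control grids. The payoff functions and grids are illustrative assumptions, not objects from the paper.

```python
def upper_value(payoff, A1, A2):
    """inf over player 1's actions of sup over player 2's actions."""
    return min(max(payoff(a1, a2) for a2 in A2) for a1 in A1)


def lower_value(payoff, A1, A2):
    """sup over player 2's actions of inf over player 1's actions."""
    return max(min(payoff(a1, a2) for a1 in A1) for a2 in A2)


def separated(a1, a2):
    # Sum of a term in a1 only and a term in a2 only.
    return (a1 - 0.3) ** 2 - (a2 - 0.7) ** 2


def coupled(a1, a2):
    # Not separated: the two actions interact.
    return (a1 - a2) ** 2


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    print(upper_value(separated, grid, grid), lower_value(separated, grid, grid))
    print(upper_value(coupled, grid, grid), lower_value(coupled, grid, grid))
```

For the separated payoff the two values coincide, while for the coupled payoff the upper value is strictly larger than the lower value, which is why the coin tossing construction is used to enforce separation in the transition function as well as in the cost.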

Theorem 4.2. Assume (A2.1)-(A2.5). Then (3.10) holds.

Proof. We will work with the approximating process $x(l\Delta)$, $l = 0, 1, \ldots$, just
described, where $x(\cdot)$ is defined by (4.5) with the piecewise constant control, and
verify the conditions imposed in the formal discussion at the beginning of the
section. Results from [21] will be exploited whenever possible. The result (3.11)
holds (with $\Delta$ replacing $h$) for the same reasons that it holds for the numerical
approximating process of the last section. For any sequence of relaxed controls
$m_i^\Delta(\cdot)$, $i = 1, 2$, there is a subsequence (indexed by $\Delta$) and $\tilde m_i(\cdot)$, $i = 1, 2$, such
that

$m_1^\Delta(x, d\alpha_1)\,m_2^\Delta(x, d\alpha_2)\,dx \Rightarrow \tilde m_1(x, d\alpha_1)\,\tilde m_2(x, d\alpha_2)\,dx.$

One needs to show the analog of (4.3), namely (along the same subsequence,
indexed by $\Delta$)

$\gamma^\Delta(m^\Delta) \to \gamma(\tilde m).$  (4.8)

The process $\{x(l\Delta)\}$ based on (4.5) inherits the crucial properties of (2.4), as
developed in [21, Chapter 4] and summarized in Subsection 2.2. In particular,
for each positive $\Delta$ and $n$ the $n$-step transition probability $P^\Delta(x, n\Delta, \cdot\,|\,m^\Delta)$
is mutually absolutely continuous with respect to Lebesgue measure, uniformly
in the control and in $x \in G$, $n\Delta \in [t_0, t_1]$, for any $0 < t_0 < t_1 < \infty$, and
it is a strong Feller process. The invariant measures are mutually absolutely
continuous with respect to Lebesgue measure, again uniformly in the control.
Then the proof of (4.8) is very similar to the corresponding proof for (2.4) given
in [21, Theorem 4.3, Chapter 4] and the details are omitted. There are controls
$m_1^{\Delta,+}(\cdot)$ which are optimal if player 1 chooses its control first (i.e., for the upper
value), and $m_2^{\Delta,-}(\cdot)$ which are optimal if player 2 chooses its control first (i.e.,
for the lower value).

We will concentrate on showing the analog of (4.1), namely,

$\gamma^{+,\Delta} = \gamma^{-,\Delta}.$  (4.9)

By the (uniform in the controls) mutual absolute continuity of the one step
transition probabilities for each $\Delta > 0$, the process satisfies a Doeblin condition,
uniformly in the control. Hence it is uniformly ergodic, uniformly in the control
[23, Theorems 16.2.1 and 16.2.3]. In particular it follows that there are constants
$K_\Delta$ and $\rho_\Delta$, with $\rho_\Delta < 1$, such that

$\sup_{x, m^\Delta} \left| E_x^{\Delta,m^\Delta} \int_U C^\Delta(x(n\Delta), \alpha)\,m^\Delta(x(n\Delta), d\alpha) - \lambda^\Delta(m^\Delta) \right| \le K_\Delta [\rho_\Delta]^n,$

where $\lambda^\Delta(m^\Delta)$ is defined above (4.7).

Define the relative value function

$g^\Delta(x, m^\Delta) = \sum_{l=0}^{\infty} E_x^{\Delta,m^\Delta} \left[ C^\Delta(x(l\Delta), m^\Delta(x(l\Delta))) - \lambda^\Delta(m^\Delta) \right].$

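The summands converge to zero exponentially, uniformly in $(x, m^\Delta(\cdot))$. Also,
by the strong Feller property the summands (for $l > 0$) are continuous. Define
$g^{\Delta,+}(x) = g^\Delta(x, m^{\Delta,+})$ and $g^{\Delta,-}(x) = g^\Delta(x, m^{\Delta,-})$. Then, a direct evaluation
yields

$\lambda^{\Delta,+} + g^{\Delta,+}(x) = E_x^{\Delta,m^{\Delta,+}} \left[ g^{\Delta,+}(x(\Delta)) + C^\Delta(x, m^{\Delta,+}(x)) \right].$  (4.10)

Next we show that under $m_1^{\Delta,+}(\cdot)$ (and for almost all $x$)

$\lambda^{\Delta,+} + g^{\Delta,+}(x) = \sup_{\alpha_2} E_x^{\Delta,m_1^{\Delta,+},\alpha_2} \left[ g^{\Delta,+}(x(\Delta)) + C^\Delta(x, m_1^{\Delta,+}(x), \alpha_2) \right].$  (4.11)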


By (4.10), (4.11) holds for almost all $x$ with the equality replaced by the inequality $\le$.
The function in brackets in (4.11) is continuous in $\alpha_2$, uniformly in
$x \in G$. Suppose that (4.11) does not hold on a set $A \subset G$ of Lebesgue measure
$l(A) > 0$. Let $\tilde m_2^\Delta(\cdot)$ denote the (relaxed feedback control representation of the)
maximizing control in (4.11). Then

$$\lambda^{\Delta,+} + g^{\Delta,+}(x) \le E_x^{\Delta,m_1^{\Delta,+},\tilde m_2^\Delta} \left[ g^{\Delta,+}(x(\Delta)) + C^\Delta(x, m_1^{\Delta,+}(x), \tilde m_2^\Delta(x)) \right], \tag{4.12}$$

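with strict inequality for $x \in A$. Now, integrate both sides of (4.12) with
respect to the invariant measure $\mu^\Delta_{\{m_1^{\Delta,+},\tilde m_2^\Delta\}}(\cdot)$ corresponding to the control
$(m_1^{\Delta,+}(\cdot), \tilde m_2^\Delta(\cdot))$ and note that

$$\int g^{\Delta,+}(x)\, \mu^\Delta_{\{m_1^{\Delta,+},\tilde m_2^\Delta\}}(dx) = \int E_x^{\Delta,m_1^{\Delta,+},\tilde m_2^\Delta}\, g^{\Delta,+}(x(\Delta))\, \mu^\Delta_{\{m_1^{\Delta,+},\tilde m_2^\Delta\}}(dx). \tag{4.13}$$

Also, by definition,

$$\lambda^\Delta(m_1^{\Delta,+}, \tilde m_2^\Delta) = \int C^\Delta(x, m_1^{\Delta,+}(x), \tilde m_2^\Delta(x))\, \mu^\Delta_{\{m_1^{\Delta,+},\tilde m_2^\Delta\}}(dx).$$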


Then, canceling the terms in (4.13) from the integrated inequality and using the
fact that the invariant measure is mutually absolutely continuous with respect
to Lebesgue measure yields $\lambda^{\Delta,+} < \lambda^\Delta(m_1^{\Delta,+}, \tilde m_2^\Delta)$, which contradicts the
optimality of $m_2^{\Delta,+}(\cdot)$ for player 2, if player 1 selects its control first. Thus, (4.11)
holds.
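The cancellation step can be written out explicitly. The display below is a sketch in the notation of (4.12) and (4.13); the symbol $\mu$ is a shorthand introduced only here for the invariant measure under the control pair $(m_1^{\Delta,+}, \tilde m_2^\Delta)$, and the argument assumes that $g^{\Delta,+}$ is bounded (as the uniform exponential convergence of the summands suggests), so that its integral is finite and may be subtracted.

```latex
% Integrate (4.12) against the probability measure \mu. The inequality is
% strict because (4.12) is strict on A and \mu(A) > 0, which follows from
% l(A) > 0 and the mutual absolute continuity of \mu and Lebesgue measure.
\begin{align*}
\lambda^{\Delta,+} + \int g^{\Delta,+}(x)\,\mu(dx)
  &< \int E_x^{\Delta,m_1^{\Delta,+},\tilde m_2^{\Delta}}\,
       g^{\Delta,+}(x(\Delta))\,\mu(dx)
     + \int C^{\Delta}\bigl(x, m_1^{\Delta,+}(x), \tilde m_2^{\Delta}(x)\bigr)\,\mu(dx) \\
  &= \int g^{\Delta,+}(x)\,\mu(dx)
     + \lambda^{\Delta}\bigl(m_1^{\Delta,+}, \tilde m_2^{\Delta}\bigr).
\end{align*}
% The first integral on the right is rewritten by invariance, (4.13);
% the second is the definition of the ergodic cost. Subtracting the
% common finite term gives
\[
\lambda^{\Delta,+} < \lambda^{\Delta}\bigl(m_1^{\Delta,+}, \tilde m_2^{\Delta}\bigr).
\]
```

The same three ingredients (strictness on a set of positive invariant measure, invariance, and the definition of the ergodic cost) are what the later argument for (4.14) reuses.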
Next, given that (4.11) holds, let us show that for almost all $x$

$$\lambda^{\Delta,+} + g^{\Delta,+}(x) = \inf_{\alpha_1} \sup_{\alpha_2} E_x^{\Delta,\alpha_1,\alpha_2} \left[ g^{\Delta,+}(x(\Delta)) + C^\Delta(x, \alpha_1, \alpha_2) \right]. \tag{4.14}$$


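By (4.11), this last equation holds if $m_1^{\Delta,+}(\cdot)$ replaces $\alpha_1$ and the inf is dropped.
Suppose that (4.14) is false. Then there are $A \subset G$ with $l(A) > 0$ and $\epsilon > 0$
such that for $x \in A$ the equality is replaced by the inequality $>$ plus $\epsilon$, with the
inequality $\ge$ holding for almost all other $x \in G$. More particularly, let $\tilde m_1^\Delta(\cdot)$
denote the minimizing control for player 1 in (4.14). Then we have, for almost
all $x$ and any $m_2^\Delta(\cdot)$,

$$\lambda^{\Delta,+} + g^{\Delta,+}(x) \ge E_x^{\Delta,\tilde m_1^\Delta,m_2^\Delta} \left[ g^{\Delta,+}(x(\Delta)) + C^\Delta(x, \tilde m_1^\Delta(x), m_2^\Delta(x)) \right] + \epsilon I_{\{x \in A\}}. \tag{4.15}$$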

Now, repeating the procedure used to prove (4.11), integrate both sides of (4.15)
with respect to the invariant measure associated with $(\tilde m_1^\Delta(\cdot), m_2^\Delta(\cdot))$, use the
fact that the invariant measure is mutually absolutely continuous with respect
to Lebesgue measure, uniformly in the controls, and cancel the terms which
are analogous to those in (4.13), to get that

$$\lambda^{\Delta,+} > \sup_{m_2^\Delta} \lambda^\Delta(\tilde m_1^\Delta, m_2^\Delta).$$

This implies that $m_1^{\Delta,+}(\cdot)$ is not optimal for player 1 if it selects its control first,
a contradiction. Thus, (4.14) holds. The analogous procedure can be carried
out for the lower value where player 2 selects its control first.

Now the fact that the dynamics and cost rate are separated in the control
implies that $\inf_{\alpha_1} \sup_{\alpha_2} = \sup_{\alpha_2} \inf_{\alpha_1}$ in (4.14). Thus, (4.14) holds with the
order of the sup and inf inverted. By working with the equation (4.14) with the
sup and inf inverted and following an argument similar to that used to prove
(4.14), one can show that $\lambda^{\Delta,+} = \lambda^{\Delta,-}$ and that $m_i^{\Delta,+}(\cdot)$ is optimal for player $i$
whether it selects first or last. The rest of the details are left to the reader. ■
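The role of separation in the interchange of inf and sup can be illustrated on finite control grids. The sketch below is a simplified illustration, not the paper's argument: it assumes that, at a fixed $x$, the bracketed one-step term splits as a sum of a function of $\alpha_1$ and a function of $\alpha_2$, and the payoff functions `separated` and `coupled`, as well as the grids `A1` and `A2`, are hypothetical. The coupled payoff is included only to show that the two orders can differ when the controls interact.

```python
from typing import Callable, Sequence

Payoff = Callable[[float, float], float]


def upper_value(F: Payoff, A1: Sequence[float], A2: Sequence[float]) -> float:
    """inf over alpha_1 of sup over alpha_2 (player 1 commits first)."""
    return min(max(F(a1, a2) for a2 in A2) for a1 in A1)


def lower_value(F: Payoff, A1: Sequence[float], A2: Sequence[float]) -> float:
    """sup over alpha_2 of inf over alpha_1 (player 2 commits first)."""
    return max(min(F(a1, a2) for a1 in A1) for a2 in A2)


def separated(a1: float, a2: float) -> float:
    """Hypothetical one-step term of the form a(alpha_1) + b(alpha_2)."""
    return (a1 - 0.3) ** 2 - (a2 - 0.7) ** 2


def coupled(a1: float, a2: float) -> float:
    """Hypothetical term in which the two controls interact."""
    return (a1 - a2) ** 2


A1 = [0.0, 0.5, 1.0]  # player 1 (minimizer) control grid, assumed
A2 = [0.0, 0.5, 1.0]  # player 2 (maximizer) control grid, assumed

gap_separated = upper_value(separated, A1, A2) - lower_value(separated, A1, A2)
gap_coupled = upper_value(coupled, A1, A2) - lower_value(coupled, A1, A2)
print(gap_separated, gap_coupled)
```

For the additive payoff each player's best choice does not depend on the other's, so the order of play is irrelevant and the gap vanishes; for the coupled payoff the player who commits first is at a disadvantage and the gap is positive.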